descriptive text
Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis
Kim, Minsu, Ma, Pingchuan, Chen, Honglie, Petridis, Stavros, Pantic, Maja
This paper explores multi-modal controllable text-to-speech synthesis (TTS), where a voice can be generated from a face image and the characteristics of the output speech (e.g., pace, noise level, distance, tone, place) can be controlled with a natural-language text description. Specifically, we aim to mitigate three challenges in face-driven TTS systems. 1) To overcome the limited audio quality of audio-visual speech corpora, we propose a training method that additionally utilizes high-quality audio-only speech corpora. 2) To generate voices not only from real human faces but also from artistic portraits, we propose augmenting the input face image with stylization. 3) To account for the one-to-many possibilities in face-to-voice mapping while still ensuring consistent voice generation, we propose to first employ sampling-based decoding and then use prompting with the generated speech samples. Experimental results validate the proposed model's effectiveness in face-driven voice synthesis.
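The consistency strategy described in the abstract above (sample several plausible voices, then prompt with one of them) can be caricatured in a few lines. Everything below, including the noise model and the embedding shapes, is invented for illustration and is not the paper's actual decoder:

```python
import random

def face_to_voice_candidates(face_embedding, n_samples=3, noise=0.1, seed=0):
    """Sampling-based decoding: the one-to-many face-to-voice mapping is
    mimicked by drawing several voice embeddings around a deterministic
    prediction (a toy stand-in for a real stochastic decoder)."""
    rng = random.Random(seed)
    return [
        [v + rng.gauss(0.0, noise) for v in face_embedding]
        for _ in range(n_samples)
    ]

def synthesize(text, voice_prompt):
    """Prompted synthesis: every utterance is conditioned on the same
    previously generated voice sample, keeping the speaker consistent."""
    return {"text": text, "voice": tuple(voice_prompt)}

face = [0.2, -0.5, 0.9]
candidates = face_to_voice_candidates(face)
chosen = candidates[0]  # fix one sampled voice...
utterances = [synthesize(t, chosen) for t in ["Hello.", "How are you?"]]
assert utterances[0]["voice"] == utterances[1]["voice"]  # ...shared by all outputs
```

The point of the sketch is only the two-stage shape: sampling gives diversity across speakers, prompting then pins one sample down for within-speaker consistency.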
A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks
Liang, Chia Xin, Tian, Pu, Yin, Caitlyn Heqi, Yua, Yao, An-Hou, Wei, Ming, Li, Wang, Tianyang, Bi, Ziqian, Liu, Ming
This survey and application guide to multimodal large language models (MLLMs) explores the rapidly developing field of MLLMs, examining their architectures, applications, and impact on AI and generative models. Starting with foundational concepts, we delve into how MLLMs integrate various data types, including text, images, video, and audio, to enable complex AI systems for cross-modal understanding and generation. The survey covers essential topics such as training methods, architectural components, and practical applications in fields ranging from visual storytelling to enhanced accessibility. Through detailed case studies and technical analysis, it examines prominent MLLM implementations while addressing key challenges in scalability, robustness, and cross-modal learning. Concluding with a discussion of ethical considerations, responsible AI development, and future directions, this resource provides both theoretical frameworks and practical insights. It offers a balanced perspective on the opportunities and challenges in developing and deploying MLLMs, and should be valuable for researchers, practitioners, and students interested in the intersection of natural language processing and computer vision.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Indiana (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (5 more...)
- Workflow (1.00)
- Research Report > Promising Solution (1.00)
- Overview (1.00)
- (2 more...)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Information Technology > Services (1.00)
- (9 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- (8 more...)
ProtDAT: A Unified Framework for Protein Sequence Design from Any Protein Text Description
Guo, Xiao-Yu, Li, Yi-Fan, Liu, Yuan, Pan, Xiaoyong, Shen, Hong-Bin
Protein design has become a critical method for applications with significant potential, such as drug development and enzyme engineering. However, protein design methods that use large language models with only pretraining and fine-tuning struggle to capture the relationships in multi-modal protein data. To address this, we propose ProtDAT, a de novo fine-grained framework capable of designing proteins from any descriptive protein text input. ProtDAT builds on the inherent characteristics of protein data, unifying sequences and text as a cohesive whole rather than treating them as separate entities. It leverages an innovative multi-modal cross-attention that integrates protein sequences and textual information at a foundational level for seamless fusion. Experimental results demonstrate that ProtDAT achieves state-of-the-art performance in protein sequence generation, excelling in rationality, functionality, structural similarity, and validity. On 20,000 text-sequence pairs from Swiss-Prot, it improves pLDDT by 6%, improves TM-score by 0.26, and reduces RMSD by 1.2 Å, highlighting its potential to advance protein design.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > California (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Education > Health & Safety > School Nutrition (0.30)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
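As a rough, generic sketch of the cross-attention the ProtDAT abstract mentions: plain scaled dot-product attention in NumPy, where sequence-token embeddings query text-token embeddings. The shapes and dimensions are invented for illustration; this is not ProtDAT's actual module:

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query row attends over
    all key/value rows and returns a weighted mix of the values."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

rng = np.random.default_rng(0)
seq_tokens = rng.normal(size=(5, 8))    # 5 residue embeddings (queries)
text_tokens = rng.normal(size=(7, 8))   # 7 description-token embeddings (keys/values)
fused = cross_attention(seq_tokens, text_tokens, text_tokens)
assert fused.shape == (5, 8)  # one text-informed vector per residue
```

The design point is that each sequence position receives a text-conditioned representation, which is what lets textual function descriptions steer generation.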
Domain-Independent Automatic Generation of Descriptive Texts for Time-Series Data
Dohi, Kota, Ito, Aoi, Purohit, Harsh, Nishida, Tomoya, Endo, Takashi, Kawaguchi, Yohei
Due to the scarcity of time-series data annotated with descriptive texts, training a model to generate descriptive texts for time-series data is challenging. In this study, we propose a method to systematically generate domain-independent descriptive texts from time-series data. We identify two distinct approaches for creating pairs of time-series data and descriptive texts: the forward approach and the backward approach. By implementing the novel backward approach, we create the Temporal Automated Captions for Observations (TACO) dataset. Experimental results demonstrate that a contrastive-learning-based model trained on the TACO dataset can generate descriptive texts for time-series data in novel domains.
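The backward approach, choosing the descriptive text first and then synthesizing a series that satisfies it, can be sketched with toy generators. The templates below are illustrative stand-ins, not the generators used to build TACO:

```python
import random

# Each caption maps to a generator that produces a series matching it.
TEMPLATES = {
    "upward trend": lambda rng, n: [i * 0.5 + rng.gauss(0, 0.1) for i in range(n)],
    "sudden spike": lambda rng, n: [5.0 if i == n // 2 else rng.gauss(0, 0.1)
                                    for i in range(n)],
}

def backward_pair(caption, n=20, seed=0):
    """Backward approach: start from the descriptive text, then synthesize
    a time series that fulfils it, yielding a guaranteed-correct pair."""
    rng = random.Random(seed)
    return TEMPLATES[caption](rng, n), caption

series, caption = backward_pair("upward trend")
assert series[-1] > series[0]  # the series matches its own description
```

This inverts the forward approach (describe an existing series), trading naturalness of the data for labels that are correct by construction.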
How to Use Large Language Models for Text Coding: The Case of Fatherhood Roles in Public Policy Documents
Lupo, Lorenzo, Magnusson, Oscar, Hovy, Dirk, Naurin, Elin, Wängnerud, Lena
Recent advances in large language models (LLMs) such as GPT-3 and GPT-4 have opened up new opportunities for text analysis in political science. They promise automation with better results and less programming. In this study, we evaluate LLMs on three original coding tasks on non-English political science texts, and we provide a detailed description of a general workflow for using LLMs for text coding in political science research. Our use case offers a practical guide for researchers looking to incorporate LLMs into their text-analysis research. We find that, when provided with detailed label definitions and coding examples, an LLM can be as good as or better than a human annotator while being much faster (up to hundreds of times), considerably cheaper (costing up to 60% less than human coding), and much easier to scale to large amounts of text. Overall, LLMs present a viable option for most text coding projects.
- Europe > Sweden > Västra Götaland > Gothenburg (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States (0.04)
- (7 more...)
- Law (1.00)
- Government (1.00)
- Education (0.93)
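The key ingredients the fatherhood-coding abstract highlights (detailed label definitions plus a few coded examples) can be sketched as a prompt builder. The labels and example texts below are hypothetical stand-ins, and the actual LLM call is deliberately omitted:

```python
# Hypothetical codebook: detailed definitions, as the workflow recommends.
LABEL_DEFINITIONS = {
    "caregiver":    "Text frames the father as a hands-on caregiver.",
    "breadwinner":  "Text frames the father mainly as an economic provider.",
    "not_relevant": "Text does not discuss fatherhood roles.",
}

# A few already-coded examples to anchor the model (invented here).
FEW_SHOT = [
    ("Fathers should share parental leave equally.", "caregiver"),
    ("A father's duty is to support the family income.", "breadwinner"),
]

def build_coding_prompt(text):
    """Assemble a coding prompt: label definitions, coded examples,
    then the item to be coded."""
    lines = ["You are coding policy documents for fatherhood roles.", "Labels:"]
    lines += [f"- {name}: {definition}" for name, definition in LABEL_DEFINITIONS.items()]
    lines.append("Examples:")
    lines += [f'Text: "{t}" -> {label}' for t, label in FEW_SHOT]
    lines.append(f'Now code this text: "{text}"')
    return "\n".join(lines)

prompt = build_coding_prompt("Paternity leave lets fathers care for newborns.")
assert "caregiver" in prompt and "Examples:" in prompt
```

The string returned would be sent to whatever LLM API the project uses; the study's finding is that the definitions and examples, not the model call, are what make coding quality comparable to a human annotator.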
How to start your adventure with AI art?
Before I answer this question, two words of introduction. What you see in the picture below comes from a model that turns text into images: AttnGAN (Attentional Generative Adversarial Network). It converts descriptive texts into synthesized images. Thanks to its attentional generative network, AttnGAN can synthesize fine details in different subregions of an image by paying attention to the relevant words in the natural-language description. The text answering the question of "what an AI is" is not a descriptive text, but let us see what our GAN will generate.
BabyAI++: Towards Grounded-Language Learning beyond Memorization
Cao, Tianshi, Wang, Jingkang, Zhang, Yining, Manivasagam, Sivabalan
Despite success in many real-world tasks (e.g., robotics), reinforcement learning (RL) agents still learn tabula rasa when facing new and dynamic scenarios. By contrast, humans can offload this burden through textual descriptions. Although recent works have shown the benefits of instructive texts in goal-conditioned RL, few have studied whether descriptive texts help agents generalize across dynamic environments. To promote research in this direction, we introduce a new platform, BabyAI++, that generates varied dynamic environments along with corresponding descriptive texts. Moreover, we benchmark several baselines inherited from the instruction-following setting and develop a novel approach to visually grounded language learning on our platform. Extensive experiments provide strong evidence that using descriptive texts improves the generalization of RL agents across environments with varied dynamics.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
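A language-conditioned policy of the kind the BabyAI++ abstract targets can be caricatured in a few lines. The tile world and the word matching below are invented for illustration; the platform's environments and models are far richer:

```python
def act(state, description):
    """Toy language-conditioned policy: the descriptive text says which
    tile is harmful in this episode, so one policy can adapt to varied
    dynamics instead of memorizing a single environment."""
    harmful = next(word for word in description.split() if word in state["tiles"])
    # Move toward any tile that the description does not mark as harmful.
    options = [tile for tile in state["tiles"] if tile != harmful]
    return options[0]

state = {"tiles": ["red", "blue", "goal"]}
assert act(state, "the red tile is lava") == "blue"
assert act(state, "the blue tile is lava") == "red"
```

The same policy changes behaviour purely because the text changed, which is the generalization effect the platform is built to measure: instructive text says what to do, descriptive text says how the world works.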
Data Storytelling: Separating Fiction From Facts
Data is playing a larger role in day-to-day business conversations than ever before. The ability to communicate with data is now a necessity for business leaders, frontline employees, and everybody in between. People who may have easily avoided discussing data in the past are finding numbers being thrust upon them. When data is a foreign language to you, it can be frustrating to not understand what's being said or be able to use it effectively in communications with others. Not being conversant or fluent in data is quickly becoming a liability in today's fast-moving data economy.
Using Artificial Intelligence to Generate Alt Text on Images CSS-Tricks
Web developers and content editors alike often forget or ignore one of the most important parts of making a website accessible and SEO performant: image alt text. If you regularly publish content on the web, then you know it can be tedious trying to come up with descriptive text. Sure, 5-10 images is doable. But what if we are talking about hundreds or thousands of images? Do you have the resources for that?